Abstract

Steady progress in our understanding of the genetic basis of autoinflammatory diseases has been made over the 
past 16 years. Since the discovery of the familial Mediterranean fever gene MEFV (also known as marenostrin) in 
1997, 18 other genes responsible for monogenic autoinflammatory diseases have been identified to date. The 
discovery of these genes was made through the utilisation of many genetic mapping techniques, including next 
generation sequencing platforms. This review describes the gene-hunting approaches, methods of
data analysis and technological platforms used, which are relevant to all those working in the field of
gene discovery for Mendelian disorders.
Introduction

The concept of autoinflammatory diseases was first
proposed by McDermott and colleagues 15 years ago
upon the elucidation of the genetic cause of TNF
Receptor Associated Periodic Syndrome (TRAPS), the
second of the prototypic periodic fevers, just two
years after the identification of MEFV, the Familial
Mediterranean Fever (FMF) gene. They used the term
'autoinflammatory' to highlight the fact that
autoantibodies were conspicuously absent and the innate
immune response seemed to predominate in the
pathogenesis [1]. Two features characterise
autoinflammatory disease: abnormally increased
inflammation, mediated predominantly by the cells and
molecules of the innate immune system, and significant
host predisposition (monogenic or polygenic) [2].

Infevers, an online database of known mutations [3] 
currently recognises 19 monogenic autoinflammatory di- 
seases where genes have been identified and mutations are 
curated. Currently in the UK, six genes can be screened 
routinely (MEFV, NLRP3, NLRP12, MVK, TNFRSF1A and
NOD2); others are sent to reference and research labs
around the world. However, as many as 60% of patients 
with autoinflammatory diseases do not fit with the known 
syndromes and/or screening for known genes is negative 
[2]. New syndromes are still being reported: in the last
two years, genes have been identified for six
monogenic autoinflammatory diseases. In addition, certain
polygenic diseases previously considered autoimmune
are now being considered for reclassification as
autoinflammatory, for example Behçet's disease, systemic
juvenile idiopathic arthritis (sJIA), ankylosing spondylitis,
Crohn's disease and psoriasis, among others [4].

With the exception of the cryopyrinopathies, where IL-1 
blockade has revolutionised treatment, and FMF where 
colchicine is a safe and effective treatment for most, there 
is no ideal consensus treatment for many emerging new 
autoinflammatory diseases. As the aetiopathogenesis re- 
mains unknown for many of these conditions, treatment 
relies on non-specific immunosuppression with
corticosteroids and/or other empiric trials of
immunosuppressants including biologics, sometimes with significant side
effects and cost. In extremely severe conditions allogeneic 
stem cell transplantation may be offered, but this is risky, 
and if pathogenesis is unknown there is no guarantee that 
it will be effective particularly for conditions associated 
with mutations not restricted to the haematopoietic sys- 
tem. Undoubtedly, identifying causative genes and path- 
ways will lead to better targeted treatment. It will also 
improve diagnostic screening for other affected family 
members or unrelated patients with unclassified autoin- 
flammatory diseases. Additionally, identification of novel 
mutations causing autoinflammation will enable genetic 
counselling for the family. Thus searching for novel
genetic causes of autoinflammatory disease both benefits
patients and advances our knowledge of these
diseases.
Review

The decision of which genes to screen for mutations can 
be based on: the function of the gene (candidate gene 
approach), location in the genome following mapping 
(linkage analysis and/or homozygosity mapping), or a 
combination of both. New next-generation sequencing 
technologies now permit the screening of every gene at 
once by sequencing whole exomes and even whole ge- 
nomes. The history of the discovery of the 19 known 
monogenic causes of autoinflammation provides exam- 
ples of the various strategies for the discovery of mono- 
genic autoinflammatory diseases, as well as telling the 
story of the continued evolution in genetic techniques. 
Genes identified thus far, and the methods used for their 
discovery are summarised in Table 1. Gene discovery in 
polygenic disease is more complex, as each genetic
variant confers only a small increase in susceptibility.
Whilst this review predominantly focuses on monogenic 
disease, association studies in polygenic disease will also 
be touched upon. 

Candidate gene approach 

Sometimes a whole genome candidate gene approach is 
taken without prior mapping. NLRP12 was identified 
from a whole genome candidate gene approach; Jeru and 
colleagues decided to screen NLRP genes in periodic 
fever patients, as NLRP proteins are involved in the re- 
cognition of microbial molecules and the activation of 
immune responses [25]. Similarly, after observing a posi- 
tive response to anakinra (the IL-1 receptor antagonist) 
in a family with a neonatal onset pyogenic disorder, 
Aksentijevich and colleagues decided to screen genes in
the IL-1 pathway and found mutations in IL1RN in 6
families [26]. Sometimes a candidate gene approach is
applied to a smaller region after genetic mapping (see
below). What is evident from these examples is that
insight into the innate immune system and the
pathogenesis of known inflammatory disease is useful to
inform candidate gene identification in novel diseases.

Table 1 Summary of the genes identified to be mutated in monogenic autoinflammatory diseases, the year they were
first recognised and mapping techniques which contributed to this

| Year | Disease | Gene | Methods |
|------|---------|------|---------|
| 1997 | Familial Mediterranean Fever (FMF) | MEFV | Multipoint and two-point linkage with RFLPs and microsatellites. Positional cloning [5,6]. |
| 1999 | TNF-Receptor Associated Periodic Syndrome (TRAPS) | TNFRSF1A | Multipoint linkage, two-point linkage [1]. Multipoint and two-point linkage and haplotype analysis [7]. |
| 1999 | Hyper IgD Syndrome (HIDS) | MVK | Pairwise linkage with microsatellites [8]. |
| 2001 | Cryopyrin Associated Periodic Syndromes (CAPS) | NLRP3 (CIAS1) | Microsatellite multipoint and two-point linkage, and haplotype analysis in MWS [9] and FCAS [10]; candidate gene CINCA [11]. |
| 2001 | Blau Syndrome | NOD2 | Microsatellites, multi and two-point linkage and haplotype analysis [12]. Candidate gene [13]. |
| 2001 | Cherubism | SH3BP2 | Microsatellite pairwise and multipoint linkage, haplotype analysis [14-16]. Candidate gene [14]. |
| 2002 | Pyogenic sterile Arthritis, Pyoderma gangrenosum and Acne (PAPA) | PSTPIP1 | Microsatellite haplotype analysis and candidate gene [17-19]. |
| 2004 | Early Onset Sarcoidosis (EOS) | NOD2 | Candidate gene [20,21]. |
| 2005 | Majeed Syndrome | LPIN2 | Microsatellites, homozygosity mapping, two and multipoint linkage and haplotype analysis [22]. |
| 2006 | Recurrent Hydatidiform Mole 1 (RHM1) | NLRP7 | Microsatellites, multipoint linkage and haplotype analysis. Candidate gene [23,24]. |
| 2008 | Familial Cold Autoinflammatory Syndrome 2 (FCAS2) | NLRP12 | Candidate gene [25]. |
| 2009 | Deficiency of IL-1 Receptor Antagonist (DIRA) | IL1RN | Candidate gene [26]. |
| 2009 | Severe infantile inflammatory bowel disease | IL10RA, IL10RB, IL10 | Haplotype analysis and multipoint linkage with microsatellites and SNP arrays [27]. Candidate gene [28]. |
| 2011 | CANDLE/JMP/NNS | PSMB8 | SNP homozygosity mapping, parametric multipoint linkage and [29-32] Exome analysis [31,32]. |
| 2011 | Deficiency of IL36 Receptor Antagonist (DITRA) | IL36RN | SNP array based homozygosity mapping then multipoint linkage and haplotype analysis with microsatellites [33]. Exome sequencing [34]. |
| 2012 | Autoinflammation & PLCγ2-associated antibody deficiency & immune dysregulation (APLAID) | PLCG2 | Exome sequencing [35]. |
| 2012 | HOIL1 Deficiency | RBCK1 (HOIL1) | SNP based deletion screening and exome sequencing [36]. |
| 2013 | Pustular psoriasis/pityriasis rubra pilaris | CARD14 | SNP multipoint linkage, microsatellite and RFLP haplotype analysis, targeted exome and candidate gene sequencing [37]. Exome and targeted-capture sequencing [38]. |

Genetic mapping 

In human genetics, linkage analysis has traditionally 
been conducted as the first step in mapping a trait. The 
majority of studies have used a combination of: two- 
point linkage analysis, multipoint linkage analysis, haplo- 
type analysis, and in some cases homozygosity mapping. 

Two-point/pairwise linkage: considers each marker
independently, calculating the logarithm of the odds (LOD)
score that each is linked to the disease locus. This method is
particularly useful for identifying which of a number of
highly polymorphic markers are closest to the mutated
gene, and is therefore well suited to restriction fragment
length polymorphism (RFLP) and microsatellite-based
mapping (see below).
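
As a worked illustration of the two-point LOD score, the sketch below assumes the simplest textbook case: phase-known, fully informative meioses, so that the likelihood depends only on the number of recombinants. The function names and the example counts are illustrative and are not taken from any of the studies cited here; real linkage software handles unknown phase, penetrance and missing data, none of which is modelled.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    theta is the recombination fraction between marker and disease locus.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    # Likelihood of the data if the loci are linked at distance theta.
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants works.
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood of the data under free recombination (theta = 0.5).
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")
    return math.log10(linked / unlinked)


def max_lod(recombinants: int, meioses: int, steps: int = 50):
    """Scan theta over an evenly spaced grid; return (best LOD, best theta)."""
    thetas = [i * 0.5 / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, meioses, t), t) for t in thetas)


if __name__ == "__main__":
    best_lod, best_theta = max_lod(recombinants=1, meioses=10)
    print(f"max LOD {best_lod:.2f} at theta {best_theta:.2f}")
```

By convention a LOD score of 3 or more is taken as evidence of linkage; ten informative meioses with no recombinants only just reach that value, which is one reason large kindreds are so valuable.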

Multipoint linkage: makes use of a genetic map, where 
the order of markers and the distance between them is 
specified. The probability that the disease locus lies at 
each point between these markers is then calculated 
sequentially. 

Haplotype analysis: involves analysing each pedigree in
turn, considering the order of the markers along each
chromosome and reconstructing the transmission of
each allele across each generation, thereby inferring the
recombination events between markers to narrow down
the disease interval. The alleles at each marker on the
disease-associated haplotype can additionally be compared
between unrelated samples; if they are identical, this may
indicate a common origin (founder effect).

Homozygosity mapping: is applied to consanguineous
families with presumed autosomal recessive inheritance,
and identifies regions of the genome that are homozygous
in affected individuals, consistent with inheritance of
both copies from a common ancestor.
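
The idea behind homozygosity mapping can be sketched as a scan for runs of consecutive markers at which every affected individual is homozygous for the same allele. This is a minimal illustration only: the genotype coding ("AA", "AB", "BB"), the function name and the minimum run length are assumptions made for the example, and real analyses work with genetic or physical distances and tolerate genotyping errors.

```python
def shared_homozygous_runs(affected, min_markers=3):
    """Find runs of markers where all affected individuals share a homozygous call.

    affected: list of genotype lists, one per affected individual, all in the
              same marker order and coded "AA", "AB" or "BB" (assumed coding).
    Returns a list of (start_index, end_index) pairs, inclusive.
    """
    n_markers = len(affected[0])
    runs = []
    start = None
    for i in range(n_markers):
        calls = {person[i] for person in affected}
        # Shared homozygosity: a single distinct call, and it is not "AB".
        shared = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and n_markers - start >= min_markers:
        runs.append((start, n_markers - 1))
    return runs


if __name__ == "__main__":
    sib1 = ["AB", "AA", "AA", "BB", "AA", "AB", "BB", "BB"]
    sib2 = ["AA", "AA", "AA", "BB", "AA", "AA", "BB", "AB"]
    print(shared_homozygous_runs([sib1, sib2]))
```

Requiring the same homozygous allele in every affected individual is what narrows the candidate interval; unaffected siblings who are homozygous for the same haplotype can then be used to exclude regions.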

A requirement of all these techniques is that uniquely
identifiable markers are spaced throughout the genetic
region of interest (be it the whole genome, a chromosome,
or a smaller candidate region); these are then tested for
proximity to the disease locus according to the
method being applied. The markers used have evolved
over the last 15 years, which is reflected in the
gene-finding approaches taken in autoinflammatory disease
studies as described below.



Restriction fragment length polymorphisms (RFLPs) 

These are genetic markers where DNA is digested using
restriction endonucleases before separation by gel
electrophoresis and detection with a probe via Southern
blot. A number of linkage studies in medium-sized
cohorts of families with Familial Mediterranean Fever
(FMF) in the early 1990s were based on RFLP markers.
These studies localised the FMF gene to the short arm of
chromosome 16 in several populations [39-42]
(together with two studies based on microsatellites, below [5,6]),
before the gene was subsequently identified [6,43].

Microsatellites 

The development of microsatellite markers allowed for 
the faster typing of a greater number of markers whilst 
consuming less DNA. Microsatellites are tandem repeats
where the repeating unit consists of 2-6 bases and the
repeat array spans 10-1000 bp. They can be
amplified by PCR with primers in the unique flanking
regions and detected by gel electrophoresis, or by using
fluorescent primers on an ABI PRISM genetic analyser,
allowing more markers to be detected at once. This type
of genotyping has been at the core of mapping for a 
number of autoinflammatory diseases, including the four 
most common hereditary recurrent fevers. 
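
To make the definition of a microsatellite concrete, the following sketch searches a DNA string for tandem repeats with a unit of 2-6 bases. The function name, the minimum of five copies and the example sequence are assumptions chosen for illustration; dedicated tandem-repeat finders also handle imperfect repeats, which this simple pattern does not.

```python
import re


def find_microsatellites(sequence: str, min_copies: int = 5):
    """Return (start, unit, copies) for perfect tandem repeats of 2-6 base units.

    The lazy quantifier makes the pattern try the shortest repeat unit first,
    so "CACACACA" is reported as copies of "CA" rather than "CACA".
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as "AAAAAAAAAA".
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Invented sequence: unique flanks around a (CA)8 repeat.
    seq = "GGTTAGC" + "CA" * 8 + "TGGATC"
    print(find_microsatellites(seq))
```

The unique flanking sequence on either side of the reported repeat is where PCR primers would sit, and the copy number is what varies between alleles.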

McDermott et al. conducted multipoint and two-point 
linkage analysis using microsatellite markers in a large 
Irish-Scottish kindred and two additional Irish kindreds 
suffering from dominant periodic fevers, and identified 
an 8 cM region on chromosome 12p13. Having observed
a lower level of TNF receptor in the blood of patients,
they screened TNFRSF1A as a candidate gene within that
region [44]. This same region had also just been
identified through multipoint linkage, two-point linkage and
haplotype analysis in an Australian kindred of Scottish
descent [7].

In 1999 pairwise linkage analysis based on microsatellite
markers in 13 families was also central to identifying
the region harbouring the gene for Hyper IgD syndrome;
a candidate gene approach then identified the gene for
mevalonate kinase (MVK) [8].

Two-point linkage, multipoint linkage and haplotype
analysis on microsatellites were employed in the
discovery of NLRP3, the gene mutated in the
cryopyrinopathies. The fact that linkage in familial cold
autoinflammatory syndrome (FCAS) and Muckle-Wells
Syndrome (MWS) identified the same region, 1q44,
highlighted that these were two diseases within a
spectrum [9,10]. Neonatal-onset multisystem inflammatory
disease (NOMID), also known as chronic infantile
neurological cutaneous and articular syndrome
(CINCA), was later identified as the third condition to
be caused by mutations in this gene after linkage studies
prompted Feldman et al. to consider NLRP3 as a
candidate gene [11].

Single nucleotide polymorphisms (SNPs) 

SNP arrays have further increased the number of markers
which can be measured. Because SNPs are biallelic,
heterozygosity at each marker is low, so a multipoint approach to
linkage that considers a greater number of markers at once
has become critical. Genomic DNA is fragmented and
hybridised to the array, which carries DNA oligomers
complementary to the sequence adjacent to the SNP. The
alternate SNP alleles affect either binding to the
oligomer and attached probe (Affymetrix arrays) or
binding to a subsequent probe after single base pair (bp)
extension (Illumina arrays). This results in
differential fluorescence of the probe, which is detected
when the array is scanned and allows the allele to
be called.

PSMB8 has been identified by three different groups
looking at patients who were classified as having three
differently named diseases. Agarwal et al. studied patients
with joint contractures, muscle atrophy, microcytic
anaemia, and panniculitis-induced lipodystrophy syndrome
(JMP) using homozygosity mapping and parametric
multipoint linkage based on SNP genotypes from the
Illumina HumanOmni1-Quad BeadChip [29]. Arima et al.
performed homozygosity mapping in five unrelated
patients, and three unaffected siblings of one of the patients,
with Nakajo-Nishimura Syndrome (NNS) using the
Affymetrix GeneChip Human Mapping 500K array set [30],
whilst Kitamura et al. used homozygosity mapping and
multipoint linkage to study Japanese autoinflammatory
syndrome with lipodystrophy (JASL) patients and
unaffected siblings from 2 consanguineous families
genotyped on the Illumina Human 610 Quad, combined with
exome sequencing [31]. They found that both NNS and
JASL were caused by the Gly201Val substitution (though
Kitamura et al. refer to this as Gly197Val based on a
different transcript of the gene), suggesting a founder effect.
Meanwhile, Liu et al. used a combination of homozygosity
mapping with SNP genotypes from the Affymetrix GeneChip
Human Mapping 250K SNP Array and candidate gene
sequencing, and in one patient performed exome sequencing,
to identify PSMB8 in Chronic Atypical Neutrophilic
Dermatosis with Lipodystrophy and Elevated Temperature
Syndrome (CANDLE) patients. This is a further
example of how mapping can support whole exome
sequencing [32].

SNP arrays have also enabled genome-wide association
studies (GWAS) in polygenic diseases. Association
studies employ a case-control experimental design, where
DNA markers in a sample of unrelated individuals are
measured to search for correlations with phenotype. For
each marker the analysis is a simple 2x2 χ² test. GWAS
use high-throughput platforms such as SNP
arrays to detect whether any markers are statistically associated
with the disease. However, as each marker counts as an
additional independent test, the significance threshold must be
corrected for this multiple testing, and to surmount the corrected
threshold the number of cases and controls needs to be
very large. Furthermore, the analysis may be confounded
by underlying population structure and ethnicity due
to differences in allele frequencies. This must be
carefully controlled or factored into the analysis. Once an
association has been identified, there are rounds of
further tests to confirm the putative association. Many
of these studies use a coarse map of markers and then
try to replicate the result in a further sample with more
markers in the region. A number of GWAS have been
undertaken in both Behçet's disease (reviewed in [45])
and Crohn's disease [46].
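
The per-marker test and the multiple-testing correction described above can be sketched as follows. The allele counts and the figure of 500,000 markers are invented for illustration, the function names are hypothetical, and a Bonferroni correction is used here simply as the most familiar adjustment; published GWAS may use other corrections and will also adjust for population structure, which this sketch ignores.

```python
import math


def chi_square_2x2(case_a: int, case_b: int, control_a: int, control_b: int) -> float:
    """Pearson chi-squared statistic for a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles of a SNP.
    """
    n = case_a + case_b + control_a + control_b
    denominator = (
        (case_a + case_b)
        * (control_a + control_b)
        * (case_a + control_a)
        * (case_b + control_b)
    )
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    return n * (case_a * control_b - case_b * control_a) ** 2 / denominator


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-marker significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    chi2 = chi_square_2x2(30, 70, 10, 90)      # invented allele counts
    p = p_value_1df(chi2)
    threshold = bonferroni_threshold(0.05, 500_000)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}, threshold = {threshold:.0e}")
    print("genome-wide significant:", p < threshold)
```

In this invented example the marker would look convincing on its own (p below 0.001) but falls far short of the corrected threshold, which illustrates why very large cohorts are needed.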

Deep sequencing based techniques - whole genome, 
whole exome and targeted capture sequencing 

Whole genome sequencing takes the total genomic
DNA from an individual and, after preparing a DNA
'library', sequences the entire genome. In targeted
capture, including whole exome sequencing, only the
genomic regions of interest (e.g. all exons) are 'captured'
from the DNA library. This is done by hybridising the
library to complementary DNA or RNA oligos and
washing the unbound DNA away before eluting the bound
DNA for sequencing. Exome capture kits can be bought
'off the shelf'; three of the biggest retailers of exome
capture kits are Nimblegen, Agilent and Illumina, and the
exact constitution of the exome depends on the
proprietary composition of their capture oligos. Sequencing
is then performed on one of the high throughput next
generation sequencing (NGS) platforms - including
Illumina's Solexa sequencing by synthesis technologies,
SOLiD sequencing by ligation, 454 pyrosequencing and
Ion Torrent semiconductor technology.

Following sequencing are several bioinformatic pro- 
cesses including: aligning sequencing reads to the refe- 
rence genome; variant calling which identifies deviations 
from the reference sequence; and annotation of variants. 
Annotation is the process of marking any variants 
found with information such as nearest gene, predicted 
consequences at protein level, and frequency in variant 
databases such as the 1000 Genomes Project and the SNP
database (dbSNP) to help to eliminate common
polymorphisms (occurring in >1% of the healthy population).
Based on these factors, certain variants may be selected 
as 'candidates' for follow-up study with further insight into 
how the gene functions and whether the suspected variant 
is likely to interfere with this. Bioinformatic programs 
which may help with predicting consequences at protein 
level include PolyPhen [47] and MutationTaster [48]. 
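
The filtering step that follows annotation can be illustrated with a small sketch. The `Variant` record, the consequence labels, the gene names and the example frequencies are all hypothetical; the 1% cut-off is the common-polymorphism threshold mentioned above. Real pipelines read annotated VCF files and use richer consequence predictions than the simple labels assumed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: str                          # nearest gene (hypothetical names below)
    consequence: str                   # predicted effect at protein level
    population_freq: Optional[float]   # None if absent from variant databases


# Consequence labels assumed, for this sketch, to be potentially damaging.
POTENTIALLY_DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_variants(variants: List[Variant], max_freq: float = 0.01) -> List[Variant]:
    """Keep rare or novel variants with a potentially damaging consequence."""
    kept = []
    for variant in variants:
        is_rare = variant.population_freq is None or variant.population_freq <= max_freq
        if is_rare and variant.consequence in POTENTIALLY_DAMAGING:
            kept.append(variant)
    return kept


if __name__ == "__main__":
    annotated = [
        Variant("GENE_A", "missense", None),      # novel, kept
        Variant("GENE_B", "missense", 0.12),      # common polymorphism, dropped
        Variant("GENE_C", "synonymous", 0.001),   # rare but silent, dropped
        Variant("GENE_D", "frameshift", 0.004),   # rare and damaging, kept
    ]
    for variant in candidate_variants(annotated):
        print(variant.gene, variant.consequence)
```

Variants that survive a filter of this kind are only candidates; as the text notes, they still need follow-up in light of what is known about the gene's function.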
For whole exome and whole genome sequencing this
process can identify thousands of non-pathogenic
variants per person. Sequencing multiple unrelated
individuals is one way to filter out non-pathogenic
variants. Onoufriadis et al. sequenced the exomes of 5
unrelated Europeans with generalised pustular psoriasis.
In two of them they found the same homozygous
mutation, and a third was found to be a compound
heterozygote, in IL36RN [34], the gene now known to be
associated with the autoinflammatory disease called
deficiency of the interleukin 36 receptor antagonist
(DITRA). A concurrent study by another group
combined homozygosity mapping with next
generation sequencing, again leading to the identification of
IL36RN and emphasising the utility of combining these
approaches for gene discovery [33].

Identifying the causal variant can be an enormous task
if multiple unrelated patients are unavailable. For each
variant the inheritance model can be considered: for
example, if the disease is recessive, is the variant
homozygous? Runs of homozygous variants can also be
identified, effectively allowing homozygosity mapping [49]. The
number of heterozygous variants from a sequenced
exome which have the correct inheritance for autosomal
dominant disease is greater than the number of
homozygous or compound heterozygous variants consistent with
a recessive disease. Sequencing other family
members can be informative to reconstruct the inheritance of
variants. Using Agilent SureSelect 50Mb exome capture
and SOLiD sequencing in a trio (an affected father and
daughter and unaffected mother) enabled the
identification of PLCG2 as the gene mutated in
Autoinflammation and PLCγ2-associated antibody deficiency
and immune dysregulation (APLAID) syndrome, an
autosomal dominant disease [35].
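
Filtering by inheritance model within a family can be sketched as below. The data layout (alternate-allele counts per sample), the variant identifiers and the function names are assumptions for illustration, and missing calls are treated as reference, which is a simplification. Compound heterozygosity is not handled, because it requires phasing variants within a gene.

```python
def dominant_candidates(genotypes, affected, unaffected):
    """Variants heterozygous in every affected and absent from every unaffected sample.

    genotypes: dict of variant id -> dict of sample name -> alternate allele
               count (0, 1 or 2). Missing samples are assumed to be 0.
    """
    hits = []
    for variant_id, calls in genotypes.items():
        in_all_affected = all(calls.get(sample, 0) == 1 for sample in affected)
        in_no_unaffected = all(calls.get(sample, 0) == 0 for sample in unaffected)
        if in_all_affected and in_no_unaffected:
            hits.append(variant_id)
    return hits


def recessive_homozygous_candidates(genotypes, affected, unaffected):
    """Variants homozygous in every affected and not homozygous in any unaffected sample."""
    hits = []
    for variant_id, calls in genotypes.items():
        hom_in_affected = all(calls.get(sample, 0) == 2 for sample in affected)
        not_hom_in_unaffected = all(calls.get(sample, 0) < 2 for sample in unaffected)
        if hom_in_affected and not_hom_in_unaffected:
            hits.append(variant_id)
    return hits


if __name__ == "__main__":
    # Invented calls for a trio like the one described: affected father and
    # daughter, unaffected mother.
    trio = {
        "var1": {"father": 1, "daughter": 1, "mother": 0},
        "var2": {"father": 1, "daughter": 1, "mother": 1},
        "var3": {"father": 1, "daughter": 0, "mother": 0},
        "var4": {"father": 2, "daughter": 2, "mother": 1},
    }
    print(dominant_candidates(trio, ["father", "daughter"], ["mother"]))
    print(recessive_homozygous_candidates(trio, ["father", "daughter"], ["mother"]))
```

Each additional family member sequenced removes a further share of the heterozygous variants, which is why even a single trio can make a dominant disease tractable.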

Very rare SNP variants can be detected by deep
sequencing, which may be an asset for genome-wide
association studies of polygenic complex diseases [50].
This approach could be beneficial in polygenic
autoinflammatory diseases such as periodic fever, aphthous
ulceration, pharyngitis, and adenitis syndrome (PFAPA) and
Behçet's disease, and has been applied in a small cohort
in Crohn's disease [51]. However, this is computationally
complex and requires the typing of many samples from
large patient cohorts to be sufficiently powered to detect
genetic associations.

Increased sensitivity of the NGS technologies is also
enabling detection of mutations in known genes present
in a minority of cells, i.e. the detection of somatic
mosaicism. Approximately 50% of CINCA patients who
have the classic features have no detected mutation in
NLRP3, so-called "mutation negative CAPS". A significant
proportion of these patients may have somatic
mosaicism in their leucocytes causing their disease, but
at a percentage of affected cells which is too low to be
detected by conventional Sanger sequencing. As these
cells make up only a small percentage (4.2-38.5%) of the
total leucocyte population, sensitivity of detection can be
increased by cloning into vectors [52]. Alternatively,
next generation sequencing technologies may
detect low levels of somatic mosaicism and could
become part of routine genetic screening for
CAPS in the future, since NLRP3 and other related genes can be
covered with sufficient read depth [Omoyinmi et al.,
manuscript submitted].
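
Why read depth matters for detecting mosaicism can be shown with a simple calculation: given the depth at a site and the number of reads carrying the variant, how likely is it that sequencing error alone would produce at least that many variant reads? The sketch below uses a binomial model with an assumed per-base error rate of 1% and an assumed significance cut-off; both figures, and the function names, are illustrative assumptions and not values taken from the studies cited.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def mosaic_call(alt_reads: int, depth: int, error_rate: float = 0.01, alpha: float = 1e-6):
    """Return (variant allele fraction, True if unlikely to be sequencing error)."""
    fraction = alt_reads / depth
    p_error = binomial_tail(alt_reads, depth, error_rate)
    return fraction, p_error < alpha


if __name__ == "__main__":
    # 25 variant reads out of 500: a 5% allele fraction.
    print(mosaic_call(alt_reads=25, depth=500))
    # 6 variant reads out of 500 is compatible with the assumed error rate.
    print(mosaic_call(alt_reads=6, depth=500))
```

At a depth of 500 reads, a 5% variant allele fraction stands well clear of the assumed error rate, whereas the same fraction at low depth would amount to only a handful of reads.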

Conclusion: clinical application and ethical considerations 

We are in the midst of a revolution in diagnostic gene- 
tics in the field of autoinflammation. NGS technologies 
now allow rapid detection of novel genes causing auto- 
inflammation in selected patients. In addition, tech- 
niques such as whole exome sequencing are increasingly 
being used in routine clinical practice as diagnostic tools 
to rapidly screen many genes, particularly for diseases 
where there is a phenotypic overlap and many possible 
genetic causes. In particular, carbohydrate deficient 
glycoprotein syndromes, and Charcot-Marie-Tooth di- 
sease are specific examples where whole exome sequen- 
cing allows rapid screening of many genes with greater 
efficiency and less cost than conventional Sanger se- 
quencing [53]. Clearly, autoinflammatory diseases would 
lend themselves well to this sort of diagnostic approach 
since it is likely that in the future tens or even hundreds 
of genes will be implicated either singly or in com- 
bination as the cause of autoinflammation in certain 
patients. NGS technologies also bring new scientific 
challenges and ethical dilemmas. Our ability to discover 
new candidate genes now mandates that we develop 
robust functional experimental approaches to confirm 
that these genes are truly pathogenic, often the most 
challenging aspect of gene discovery in practice. At the 
same time, we are faced with new ethical considerations: 
the consequence of screening entire exomes and ge- 
nomes means that unrelated/coincidental genetic disease 
may be identified. This raises complex ethical questions 
in relation to whether these findings should be commu- 
nicated to the patient/family. This important issue in 
relation to "return of results" is reviewed elsewhere [54]. 
In the field of autoinflammation however, irrespective of 
these important ethical dilemmas, these new technolo- 
gies have undoubtedly propelled us into an era of in- 
tense gene discovery and greater understanding of the 
immune system and its regulation that will rapidly be 
translated into better diagnostics and therapeutics for 
these patients. 